Skip to content

Compile time path #101

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 15 commits into
base: main
Choose a base branch
from

Conversation

thehiddenwaffle
Copy link
Contributor

@thehiddenwaffle thehiddenwaffle commented May 20, 2025

Here's the big one, lots of design choices to be made. I tried to keep everything as simple as possible for now but in the future I could definitely remove a couple hundred lines(ToTokens impls mostly) with declarative macros and once it stabilizes and has a full test suite I can work on better error reporting from the macro side and the FromPest side

I have tried to list out all the options that I have thought of, as well as the one that I think is best but I'm certainly not going to make those choices without presenting them to you and ensuring you agree. These are just my suggestions and I'm happy to expand on my thought process but ultimately it's your choice of course.

In this PR I have not integrated anything I created with the main functionality of this repository because there are still design choices to make, because its a 2600 line addition, and because it's barely tested.

Summary:

  • Added a new crate with a common ast called jsonpath-ast
    • Common structures shared between syn and pest, allowing them to parse into the same things
    • All syn related implementations are locked behind a crate feature "compiled-path", ensuring that users who do not need compiled paths via the json_path! macro will not suffer longer build times
      • Work could still be done to completely eliminate syn and other macro related dependencies when "compiled-path" is toggled off, currently not possible
    • Includes some helper types(PestIgnoredPunctuated, PestLiteralWithoutRule) for when pest checks that punctuation exist but doesn't parse it, so syn still needs to check that it exists, and FromPest needs to ignore it(because it already guaranteed it) during pest::Rules -> AST conversion
    • Current ToTokens implementation has each struct generate itself as tokens, ie: Main writes out "Main::new(....inners)"
      • Could be changed to be anything at all(see Macro Output below)
  • Added new proc macro crate jsonpath-rust-impl
    • Provides the json_path! proc macro, depends on jsonpath-ast with feature "compiled-path" enabled
    • Has a test suite I generated using the basic.json from the rfc and trybuild. I think this would be a good approach, but I'm open to better ideas and therefore only did basic.json not the whole RFC
      • Even with trybuild there are still choices, see Testing below
    • Not everything can be parsed by syn at compile time, see Macro Limitations below

Macro Limitations:

  • single quote strings break the macro parser because in rust only chars are single quoted
    • rust has r"# #" strings so in my opinion this is no impact, we can provide a sed regex?
  • The macro does not care bout extraneous whitespace, but this just means that more strings are valid paths without changing any functionality.
  • due to constraints rust has put on it's identifiers, emoji in member_name_shorthand is not allowed(works in both string literals and bracket field accessjson_query!( $["☺"] ))
  • we could solve these if we changed the input to a string literal, however:
    • this would make error messages and syntax highlighting severely more complex to implement, and currently syn is giving them to us absolutely free(in my opinion the limitations are acceptable and we should not have the macro take strings because any path is still buildable with some tweaks like changing to bracket field access or converting strings)

Macro Output:

  • Currently, the macro just outputs AST literals, though of course it could be made to output anything, so either:
    • We implement Query for the AST nodes and just delete the existing types
    • Provide impls like From<JPQueryAST> for JPQuery for all existing structs then just have the macro call .into()
    • We have the macro output the current types(my least favorite, probably error prone and somewhat redundant)

Testing:

  • If using trybuild, we can either:
    • Create files dynamically from rfc test case file and then delete afterward to ensure accuracy and account for changes in the suite?
    • Just statically build the suite(I made a quick script), how often could it even change?(to me this feels like the answer)

Introduces `jsonpath-rust-impl` crate as a procedural macro for JSONPath parsing and validation. Includes AST definitions, Pest-based grammar, and integration with the main library. Starts refining error handling and debugging within parsing logic.
Introduce `parse_terminated_nonempty` to enforce nonempty parsing for `PestIgnoredPunctuated`. Updated relevant AST nodes to use this stricter parsing function, ensuring better validation and error handling for empty input cases.
These tests validate that the `json_query!` macro correctly rejects various malformed JSONPath expressions. The added cases cover scenarios like empty segments, unexpected tokens, and improper syntax for selectors.
@thehiddenwaffle

This comment was marked as resolved.

@besok
Copy link
Owner

besok commented May 21, 2025

Thank you for the pr. I will hapily take a look but closer to the weekend (too much stuff now)

@thehiddenwaffle
Copy link
Contributor Author

No problem, it's pretty huge

Copy link
Owner

@besok besok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, can you complement the description in the general readme with the changes:

  • what feature means
  • how to use it
    as soon as the pr is ready to merge.

jp_query = {root ~ segments}
segments = {(S ~ segment)*}
segments = !{(S ~ segment)*}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as far as i get, ! will exclude it from ast as a sep node and include into the parent node instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm gonna quote the pest docs because I'm not the best at explaining and I'm relatively new to pest:

Compound-atomic ($)
Compound-atomic are identical to atomic rules with the exception that rules called by them are not forbidden from generating token pairs.

Non-atomic (!)

Non-atomic are identical to normal rules with the exception that they stop the cascading effect of atomic and compound-atomic rules.

So all the not-atomic does in this particular case is override the compound atomic for the segment rule and all downstream rules, allowing for everything except for the main rule to not be atomic. The main rule being atomic makes it so that leading and trailing white space on the main rule will be rejected by the parser, allowing us to remove the validation logic in parser and allowing the FromPest impl to be more resilient. So adding the ! to the child rules allows them to have leading/trailing white space again.

I guess I should have also added ! to the root but in this case root does not parse to rules so it doesn't make a difference functionally.

@besok
Copy link
Owner

besok commented May 24, 2025

First of all, I have to say, this is impressive work. Thank you.
I would definitely propose to finalize the pr. It has a lot of value.

Work could still be done to completely eliminate syn and other macro related dependencies when "compiled-path" is toggled off, currently not possible

I dont think if you need to eliminate syn unless you see the specific reasons for that :)

Has a test suite I generated using the basic.json from the rfc and trybuild. I think this would be a good approach, but I'm open to better ideas and therefore only did basic.json not the whole RFC

At first glance, it looks good, though the tests now takes around 25-30 seconds .

single quote strings break the macro parser because in rust only chars are single quoted

I don't think it is a problem, just need to point it out explicitly in the docs.

rust has r"# #" strings so in my opinion this is no impact, we can provide a sed regex?

precisely

The macro does not care bout extraneous whitespace, but this just means that more strings are valid paths without changing any functionality.

Don't think, it is a problem, but worth noting in docs

due to constraints rust has put on it's identifiers, emoji in member_name_shorthand is not allowed(works in both string literals and bracket field accessjson_query!( $["☺"] ))

it is currently a problem in the current grammar as well, so also fine.

we could solve these if we changed the input to a string literal, however ...

I would suggest to keep the existing format.

Currently, the macro just outputs AST literals, though of course it could be made to output anything, so either

I would think about From<JPQueryAST> for JPQuery as the first choice but up to you

If using trybuild, we can either:

I would say, statically build is fine

@thehiddenwaffle
Copy link
Contributor Author

Thank you for the complement and for taking the time! This is actually the first open source rust repo I've interacted with so thank you for your patience with my lack of experience 😄

Work could still be done to completely eliminate syn and other macro related dependencies when "compiled-path" is toggled off, currently not possible

I dont think if you need to eliminate syn unless you see the specific reasons for that :)

That's fair, I've heard complaints of long compile times with syn being a dependency so I figured I'd offer the option, though we do only need one or two of the features syn uses and it's aggressively feature gated so maybe it's fine. Either way the 25 sec test is a much greater priority 😅

Has a test suite I generated using the basic.json from the rfc and trybuild. I think this would be a good approach, but I'm open to better ideas and therefore only did basic.json not the whole RFC

At first glance, it looks good, though the tests now takes around 25-30 seconds .

Yeah... I didn't even think about that I was just relieved I had finally got them all passing when I was making this. I'm looking into just having the test read them all, and concatenate both all the tests and all the .stderr files together, write and .gitignore the result, then run the compile failed on the one file, hopefully that speeds things up

single quote strings break the macro parser because in rust only chars are single quoted
{next 5}

I will make subsequent commits for docs pointing out these limitations.

Currently, the macro just outputs AST literals, though of course it could be made to output anything, so either

I would think about From for JPQuery as the first choice but up to you

Sounds good, I can add these in a subsequent PR that's geared more towards "time to actually use this stuff" or I could put it here, whatever you'd like

…erals so that rust str->lit parsing doesn't crash on extraneous spaces.
Refactor compile tests by consolidating failing cases into a single `compile_but_expect_err.rs` file and updating test structure. Removed redundant individual files to streamline file management while retaining all test case coverage. Also adjusted parsing utilities to better handle edge cases and ensure clarity.
@thehiddenwaffle
Copy link
Contributor Author

thehiddenwaffle commented May 26, 2025

tests now take 4.5 sec to both build and run. Moving onto docs soon(possibly tomorrow).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants